Text Summarization and Singular Value Decomposition

Authors

  • Josef Steinberger
  • Karel Jezek
Abstract

In this paper we present the use of singular value decomposition (SVD) in text summarization. First, we outline the taxonomy of generic text summarization methods. Then we describe the principles of SVD and how it can identify semantically important parts of a text. We propose a modification of SVD-based summarization that improves the quality of the generated extracts. In the second part we propose two new SVD-based evaluation methods, which measure content similarity between an original document and its summary. In the evaluation part, our summarization approach is compared with five other available summarizers. To evaluate summary quality we used, apart from a classical content-based evaluator, both newly developed SVD-based evaluators. Finally, we study the influence of summary length on summary quality from the perspective of the three evaluation methods mentioned.
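
To make the idea of SVD-based sentence selection concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give their formula, so the tokenizer, the raw term-frequency weighting, the `rank` parameter, and the length-based scoring rule (the norm of each sentence vector in the reduced space, weighted by the singular values) are all assumptions chosen for illustration. The function names `build_term_sentence_matrix`, `svd_sentence_scores`, and `summarize` are hypothetical.

```python
import re
import numpy as np


def build_term_sentence_matrix(sentences):
    """Return (matrix, vocabulary); rows are terms, columns are sentences.

    Assumption: plain lowercase word counts, no stemming or stop-word removal.
    """
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(vocabulary)}
    matrix = np.zeros((len(vocabulary), len(sentences)))
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[index[t], j] += 1.0
    return matrix, vocabulary


def svd_sentence_scores(matrix, rank):
    """Score each sentence by its length in the top-`rank` singular directions.

    With A = U diag(s) Vt, column k of Vt holds the coordinates of sentence k.
    Assumed scoring rule: sqrt(sum_i (s_i * Vt[i, k])**2) over the first
    `rank` dimensions.
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    r = min(rank, len(s))
    weighted = vt[:r, :] * s[:r, None]
    return np.sqrt((weighted ** 2).sum(axis=0))


def summarize(sentences, num_sentences=2, rank=2):
    """Return the highest-scoring sentences, kept in document order."""
    matrix, _ = build_term_sentence_matrix(sentences)
    scores = svd_sentence_scores(matrix, rank)
    top = np.argsort(-scores)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(list_of_sentences, num_sentences=3, rank=2)` returns an extract of three sentences. A small `rank` keeps only the dominant latent topics, so sentences that load heavily on those topics score highest. When `rank` covers every singular value, the score reduces to the plain column norm of the term-by-sentence matrix.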

Similar resources

A Multi-Document Multi-Lingual Automatic Summarization System

In this paper, a new multi-document multi-lingual text summarization technique, based on singular value decomposition and hierarchical clustering, is proposed. The proposed approach relies on only two resources for any language: a word segmentation system and a dictionary of words along with their document frequencies. The summarizer initially takes a collection of related documents, a...

Clustered Sub-Matrix Singular Value Decomposition

This paper presents an alternative algorithm based on the singular value decomposition (SVD) that creates vector representation for linguistic units with reduced dimensionality. The work was motivated by an application aimed to represent text segments for further processing in a multi-document summarization system. The algorithm tries to compensate for SVD’s bias towards dominant-topic document...

Dimensionality Reduction Aids Term Co-Occurrence Based Multi-Document Summarization

A key task in an extraction system for query-oriented multi-document summarisation, necessary for computing relevance and redundancy, is modelling text semantics. In the Embra system, we use a representation derived from the singular value decomposition of a term co-occurrence matrix. We present methods to show the reliability of performance improvements. We find that Embra performs better with...

Significant Sentence Extraction by Euclidean Distance Based on Singular Value Decomposition

This paper describes an automatic summarization approach that constructs a summary by extracting the significant sentences. The approach takes advantage of the co-occurrence relationships between terms only in the document. The techniques used are principal component analysis (PCA) to extract the significant terms and singular value decomposition (SVD) to find out the significant sentences. The P...

Publication date: 2004